[SPARK-56914][SQL] Simplify decimal arithmetic codegen under ANSI mode by gengliangwang · Pull Request #55939 · apache/spark

gengliangwang · 2026-05-17T23:30:46Z

Title: [SPARK-56914][SQL] Refactor decimal arithmetic codegen under ANSI mode
Base: master (stacked on PR 3 - SPARK-56911)
Head: gengliangwang:SPARK-56914-decimal-arithmetic

What changes were proposed in this pull request?

Use CastUtils.changePrecisionExact / changePrecisionOrNull (added in SPARK-56911) from the DecimalType.Fixed branches of:

BinaryArithmetic.doGenCode (covers Add / Subtract / Multiply on Decimal).
BinaryDivModLike.doGenCode (covers Divide / IntegralDivide / Remainder / Pmod on Decimal).
BinaryArithmetic.checkDecimalOverflow (eval path used by both groups via numeric.plus/minus/times/div).

Each call site goes from eval1.$op(eval2).toPrecision(p, s, ROUND_HALF_UP, !failOnError, ctx) + a 4-line null check to a single CastUtils.changePrecision{Exact,OrNull} call.
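For reviewers who want the call-site change in concrete terms, here is a minimal sketch of the before and after shapes. It is not the Spark code: `java.math.BigDecimal` stands in for Spark's `Decimal`, a plain `String` stands in for `QueryContext`, and the class and the `addInline` / `addViaHelper` names are hypothetical. The real helpers are described as mutating the `Decimal` in place, while `BigDecimal` is immutable, so the sketch returns a new value instead.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Stand-in sketch: BigDecimal plays the role of Spark's Decimal and a plain
// String plays the role of QueryContext. Every name below is hypothetical.
final class DecimalArithmeticSketch {

    // ANSI flavor: round HALF_UP to `scale`, throw if more than `precision` digits remain.
    static BigDecimal changePrecisionExact(BigDecimal d, int precision, int scale, String context) {
        BigDecimal rounded = d.setScale(scale, RoundingMode.HALF_UP);
        if (rounded.precision() > precision) {
            throw new ArithmeticException(rounded + " cannot be represented as decimal("
                + precision + ", " + scale + ") in " + context);
        }
        return rounded;
    }

    // Non-ANSI flavor: same rounding, but overflow yields null instead of an error.
    static BigDecimal changePrecisionOrNull(BigDecimal d, int precision, int scale) {
        BigDecimal rounded = d.setScale(scale, RoundingMode.HALF_UP);
        return rounded.precision() > precision ? null : rounded;
    }

    // "Before" shape: arithmetic, rounding, and the overflow/null handling all inline.
    static BigDecimal addInline(BigDecimal a, BigDecimal b, int p, int s, boolean failOnError) {
        BigDecimal rounded = a.add(b).setScale(s, RoundingMode.HALF_UP);
        if (rounded.precision() > p) {
            if (failOnError) {
                throw new ArithmeticException("decimal overflow");
            }
            return null;
        }
        return rounded;
    }

    // "After" shape: the call site only picks a helper based on the ANSI flag.
    static BigDecimal addViaHelper(BigDecimal a, BigDecimal b, int p, int s, boolean failOnError) {
        return failOnError
            ? changePrecisionExact(a.add(b), p, s, "a + b")
            : changePrecisionOrNull(a.add(b), p, s);
    }
}
```

Both shapes should agree on every input; the difference is only how much text each call site carries, which is the point of the refactor.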

Why are the changes needed?

Part of SPARK-56908 (umbrella). Decimal arithmetic is widespread in TPC-DS plans, and the BinaryArithmetic Decimal branch was one of the longer ANSI codegen bodies still emitted inline.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *ArithmeticExpressionSuite *DecimalSuite"

60/60 pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

gengliangwang · 2026-05-17T23:32:32Z

Stack overview (SPARK-56908 umbrella)

This PR is part of a stack of 8 PRs against SPARK-56908. Order:

1. #55934: [SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode (this stack base)
2. #55935: [SPARK-56910][SQL] Simplify Cast to byte/short codegen under ANSI mode
3. #55936: [SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode
4. #55937: [SPARK-56912][SQL] Simplify Cast to boolean codegen under ANSI mode
5. #55939: [SPARK-56914][SQL] Simplify decimal arithmetic codegen under ANSI mode (depends on #55936)
6. #55938: [SPARK-56913][SQL] Simplify BinaryArithmetic byte/short codegen under ANSI mode (independent)
7. #55940: [SPARK-56915][SQL] Simplify MakeDate/MakeInterval codegen under ANSI mode (independent)
8. #55941: [SPARK-56916][SQL] Simplify ElementAt array codegen under ANSI mode (independent)

PRs 1-4 are linearly stacked on each other (each branch is based on the previous one). PR 5 (decimal arithmetic) is stacked on top of PR 3 (cast decimal) since it uses CastUtils.changePrecisionExact. PRs 6, 7, 8 branch off master independently.

### What changes were proposed in this pull request?

Introduce `CastUtils.java` and use it from `Cast.scala` to collapse the multi-line ANSI overflow-check codegen for casts that target `int` and `long` into one-line static-method calls. Source and target `DataType` constants used in the overflow error message live as `private static final` fields on the helper class, so the happy path performs no per-row `references[]` lookups.

Helpers added:

* `longToIntExact(long)` for narrowing `long -> int`.
* `floatToIntExact(float)`, `doubleToIntExact(double)` for fractional -> int.
* `floatToLongExact(float)`, `doubleToLongExact(double)` for fractional -> long.

`Cast.scala` changes:

* `castIntegralTypeToIntegralTypeExactCode` and `castFractionToIntegralTypeCode` dispatch on the target type: `int` (and `long` for the fraction case) emit a `CastUtils.<...>Exact` call; byte/short targets keep the inline body (refactored in SPARK-56910).
* Eval paths for `castToInt` add ANSI `LongType` / `FloatType` / `DoubleType` cases, and `castToLong` adds `FloatType` / `DoubleType` cases, both delegating to the new helpers.

### Why are the changes needed?

Part of SPARK-56908. The current ANSI cast codegen emits 5-line inline overflow blocks per call site. Multiplied across the many cast paths in a TPC-DS plan, this contributes meaningfully to the generated source size and to Janino compile time, and pushes whole-stage methods closer to the 64KB JVM method limit.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

`build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *ExpressionClassIdentitySuite"`

312/312 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x
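As a rough illustration of what the `int`-target helpers listed above might look like, here is a minimal sketch. The method names and parameter types match the ones in the description, but the bodies are assumptions: the fractional bounds check (`Math.floor` / `Math.ceil` against the target range) is my reading of the inline check being replaced, and `ArithmeticException` stands in for Spark's cast-overflow error and its `DataType` constants.

```java
// Hypothetical stand-in for part of CastUtils; bodies are assumed, not copied from Spark.
final class IntegralCastSketch {

    // long -> int: the value fits exactly when a round trip through int preserves it.
    static int longToIntExact(long v) {
        if (v == (int) v) {
            return (int) v;
        }
        throw new ArithmeticException("long value " + v + " overflows int");
    }

    // double -> int: accept values whose truncation lands inside the int range.
    // NaN fails both comparisons, so it takes the throwing path.
    static int doubleToIntExact(double v) {
        if (Math.floor(v) <= Integer.MAX_VALUE && Math.ceil(v) >= Integer.MIN_VALUE) {
            return (int) v;
        }
        throw new ArithmeticException("double value " + v + " overflows int");
    }
}
```

With helpers like these, generated code at each cast site reduces to a single assignment such as `result = CastUtils.longToIntExact(input);`, where `result` and `input` are placeholder variable names.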

### What changes were proposed in this pull request?

Extend `CastUtils.java` with helpers for `byte` and `short` ANSI cast targets and use them from `Cast.scala`. Drops the byte/short-target dispatch (and the now-unused `lowerAndUpperBound` Scala helper) added in SPARK-56909 -- after this PR, all integral and fractional narrowing ANSI casts share the same `CastUtils.<...>Exact` one-line codegen.

Helpers added:

* `shortToByteExact(short)`, `intToByteExact(int)`, `longToByteExact(long)`
* `intToShortExact(int)`, `longToShortExact(long)`
* `floatToByteExact(float)`, `doubleToByteExact(double)`
* `floatToShortExact(float)`, `doubleToShortExact(double)`

`Cast.scala` changes:

* `castIntegralTypeToIntegralTypeExactCode` / `castFractionToIntegralTypeCode` no longer dispatch on target type -- the helper-name pattern `${integralPrefix(from)}To${target.capitalize}Exact` covers all four target types.
* Eval paths for `castToByte` and `castToShort` add ANSI cases for `ShortType` / `IntegerType` / `LongType` / `FloatType` / `DoubleType` source types that delegate to the new helpers; the existing `exactNumeric.toInt(b) + bounds-check` fallback now only handles the remaining `Decimal` source.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). The original byte/short ANSI cast bodies were 5 lines each across 8 call sites; this PR collapses them to one line per call site, matching the int/long target work from SPARK-56909.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite \
  *ExpressionClassIdentitySuite"
```

312/312 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x
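The helper-name pattern `${integralPrefix(from)}To${target.capitalize}Exact` can be illustrated with a small sketch. This is an assumption-heavy stand-in: it supposes that `integralPrefix(from)` yields the source type's Java primitive name (for example `long`), and the `helperName` / `emitCast` functions and the variable names passed to them are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of deriving the helper name from source and target primitive names.
final class HelperNameSketch {

    // Upper-case the first character, e.g. "byte" -> "Byte".
    static String capitalize(String s) {
        return s.substring(0, 1).toUpperCase(Locale.ROOT) + s.substring(1);
    }

    // Assumes the prefix is simply the source primitive name, e.g. "long".
    static String helperName(String fromPrimitive, String targetPrimitive) {
        return "CastUtils." + fromPrimitive + "To" + capitalize(targetPrimitive) + "Exact";
    }

    // One line of generated Java: assign the helper's result to the output variable.
    static String emitCast(String fromPrimitive, String targetPrimitive,
                           String inputVar, String resultVar) {
        return resultVar + " = " + helperName(fromPrimitive, targetPrimitive)
            + "(" + inputVar + ");";
    }
}
```

Because one pattern produces every name in the helper list, the codegen method needs no per-target branching, which appears to be why the target-type dispatch could be dropped.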

### What changes were proposed in this pull request?

Extend `CastUtils.java` with two helpers for decimal precision adjustment and use them from `Cast.changePrecision` (both the eval and codegen implementations). The new helpers mutate the input `Decimal` in place (matching the behavior of the existing inline codegen), so they're safe to call on the temporary produced by `Decimal.fromString(...)` / `Decimal.apply(...)` / decimal-arithmetic results.

Helpers added:

* `changePrecisionExact(Decimal, int, int, QueryContext)`: ANSI throw on overflow, preserves the per-call-site `QueryContext` so error messages keep their query-origin info.
* `changePrecisionOrNull(Decimal, int, int)`: non-ANSI, returns `null` on overflow (no `QueryContext` needed).

`Cast.scala` changes:

* `changePrecision` eval method dispatches on `nullOnOverflow` and delegates to the appropriate helper.
* `changePrecision` codegen method has three branches now: the existing `canNullSafeCast` fast path (unchanged), a `nullOnOverflow` branch (inline), and the ANSI throw branch which now emits a one-line `CastUtils.changePrecisionExact(...)` call instead of the 5-line `if/else` overflow block.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). The ANSI throw branch of `Cast.changePrecision` is hit by every cast to decimal that may overflow (very common in TPC-DS, where `cast(int as decimal(7,2))` is widespread). Collapsing the 5-line inline body to one line shrinks the generated Java source for those plans.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *DecimalSuite \
  *ExpressionClassIdentitySuite"
```

337/337 pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x
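The three-branch codegen described above can be pictured with a toy emitter that returns the Java text for each branch. Treat it as a sketch under assumptions: the emitter, its parameters, and the variable names are hypothetical; the text for the `canNullSafeCast` and `nullOnOverflow` branches is an illustrative guess at their inline shape; and it assumes `changePrecisionExact` returns the adjusted `Decimal`, which the description does not state.

```java
// Hypothetical emitter mirroring the three branches; not the actual Cast.scala codegen.
final class ChangePrecisionCodegenSketch {

    static String emit(String d, String evPrim, String evNull, int precision, int scale,
                       boolean canNullSafeCast, boolean nullOnOverflow, String ctxRef) {
        String args = precision + ", " + scale;
        if (canNullSafeCast) {
            // Fast path: overflow is impossible, so no check is emitted.
            return d + ".changePrecision(" + args + ");\n" + evPrim + " = " + d + ";";
        } else if (nullOnOverflow) {
            // Non-ANSI path: stays inline and marks the result null on overflow.
            return "if (" + d + ".changePrecision(" + args + ")) {\n"
                + "  " + evPrim + " = " + d + ";\n"
                + "} else {\n"
                + "  " + evNull + " = true;\n"
                + "}";
        } else {
            // ANSI path: a single helper call that throws on overflow.
            return evPrim + " = CastUtils.changePrecisionExact("
                + d + ", " + args + ", " + ctxRef + ");";
        }
    }
}
```

In this sketch the ANSI branch shrinks to one emitted line, while the non-ANSI branch keeps its five-line `if/else`, matching the branch-by-branch description.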


This was referenced May 17, 2026

[SPARK-56913][SQL] Simplify BinaryArithmetic byte/short codegen under ANSI mode #55938

Open

[SPARK-56915][SQL] Simplify MakeDate/MakeInterval codegen under ANSI mode #55940

Open

[SPARK-56916][SQL] Simplify ElementAt array codegen under ANSI mode #55941

Open

gengliangwang force-pushed the SPARK-56914-decimal-arithmetic branch 2 times, most recently from 883cea1 to ee5c3fb Compare May 17, 2026 23:35

gengliangwang requested review from cloud-fan and viirya May 17, 2026 23:39

gengliangwang marked this pull request as draft May 17, 2026 23:39

gengliangwang added 4 commits May 18, 2026 17:06

gengliangwang force-pushed the SPARK-56914-decimal-arithmetic branch from ee5c3fb to 2a324d8 Compare May 18, 2026 17:07

gengliangwang wants to merge 4 commits into apache:master from gengliangwang:SPARK-56914-decimal-arithmetic
